Privacy-Preserving Statistical Data Analysis on Federated Databases
Abstract
The quality of empirical statistical studies is tightly related to the quality and amount of source data available. However, it is often hard to collect data from several sources due to privacy requirements or a lack of trust. In this paper, we propose a novel way to combine secure multi-party computation technology with federated database systems to preserve privacy in statistical studies that combine and analyse data from multiple databases. We describe an implementation on two real-world platforms: the Sharemind secure multi-party computation platform and the X-Road database federation platform. Our solution enables the privacy-preserving linking and analysis of databases belonging to different institutions. Indeed, a preliminary analysis from the Estonian Data Protection Inspectorate suggests that the correct implementation of our solution ensures that no personally identifiable information is processed in such studies. Therefore, our proposed solution can potentially reduce the costs of conducting statistical studies on shared data.
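To make the idea of analysing data without revealing individual inputs more concrete, the following is a minimal sketch of additive secret sharing, one common building block of secure multi-party computation. It is an illustration only and is not the Sharemind or X-Road API. The ring size `MODULUS`, the three-party setup, and the function names `share`, `reconstruct`, and `secure_sum` are all assumptions made for this example.

```python
import secrets

# Assumption for illustration: values are shared in the ring of integers mod 2^32.
MODULUS = 2 ** 32


def share(value, n_parties=3):
    """Split `value` into `n_parties` additive shares modulo MODULUS."""
    # All but the last share are uniformly random.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to `value` mod MODULUS.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % MODULUS


def secure_sum(private_values, n_parties=3):
    """Sum private inputs so that no single compute party sees any input."""
    # Each data owner sends exactly one share of its value to each compute party.
    per_party = [[] for _ in range(n_parties)]
    for value in private_values:
        for party_shares, s in zip(per_party, share(value, n_parties)):
            party_shares.append(s)
    # Each compute party adds the shares it holds; each share on its own is random.
    partial_sums = [sum(p) % MODULUS for p in per_party]
    # Only the combined partial sums reveal the aggregate result.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    # Hypothetical inputs, e.g. one aggregate value per participating database.
    print(secure_sum([120, 75, 300]))
```

In this sketch each compute party only ever holds uniformly random-looking shares, so an aggregate such as a sum can be published without any party learning an individual contribution. A real deployment would also need secure channels, protocols for operations beyond addition, and record linking across databases, none of which are shown here.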
Similar resources
Performance Analysis of Privacy Preserving Naïve Bayes Classifiers for Distributed Databases
The problem of secure and fast distributed classification is an important one. The main focus of the paper is on privacy preserving distributed classification rule mining. This research paper addresses the performance analysis of privacy preserving Naïve Bayes classifiers for horizontal and vertical partitioned databases. The Naïve Bayes classifier is a simple but efficient baseline classifier....
Privacy Preserving Linear Regression on Distributed Databases
Studies that combine data from multiple sources can tremendously improve the outcome of the statistical analysis. However, combining data from these various sources for analysis poses privacy risks. A number of protocols have been proposed in the literature to address the privacy concerns; however they do not fully deliver on either privacy or complexity. In this paper, we present a (theoretica...
Privacy-preserving GWAS analysis on federated genomic datasets
BACKGROUND: The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, high-quality GWAS usually requires a large number of samples, which can grow beyond the capability of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution c...
Automatic Compliance of Privacy Policies in Federated Digital Identity
Privacy [13] in the digital world is an important problem which is becoming even more pressing as new collaborative applications are developed. The lack of privacy preserving mechanisms is particularly problematic in federated identity management contexts. In such a context, users can seamlessly interact with a variety of federated web services, through the use of single-sign-on mechanisms and ...
SAFETY: Secure gwAs in Federated Environment Through a hYbrid solution with Intel SGX and Homomorphic Encryption
Recent studies demonstrate that effective healthcare can benefit from using human genomic information. For instance, analysis of tumor genomes has revealed 140 genes whose mutations contribute to cancer. As a result, many institutions are using statistical analysis of genomic data, which are mostly based on genome-wide association studies (GWAS). GWAS analyze genome sequence variations in...